54 research outputs found
Coarse-Graining Auto-Encoders for Molecular Dynamics
Molecular dynamics simulations provide theoretical insight into the
microscopic behavior of materials in condensed phase and, as a predictive tool,
enable computational design of new compounds. However, because of the large
temporal and spatial scales involved in thermodynamic and kinetic phenomena in
materials, atomistic simulations are often computationally unfeasible.
Coarse-graining methods allow simulating larger systems, by reducing the
dimensionality of the simulation, and propagating longer timesteps, by
averaging out fast motions. Coarse-graining involves two coupled learning
problems: defining the mapping from an all-atom to a reduced representation,
and parametrizing a Hamiltonian over coarse-grained coordinates.
Multiple statistical mechanics approaches have addressed the latter, but the
former is generally a hand-tuned process based on chemical intuition. Here we
present Autograin, an optimization framework based on auto-encoders to learn
both tasks simultaneously. Autograin is trained to learn the optimal mapping
between all-atom and reduced representation, using the reconstruction loss to
facilitate the learning of coarse-grained variables. In addition, a
force-matching method is applied to variationally determine the coarse-grained
potential energy function. This procedure is tested on a number of model
systems including single-molecule and bulk-phase periodic simulations. Comment: 8 pages, 6 figures
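The two sub-problems named in this abstract, a mapping and a force-matching objective, can be illustrated with a minimal Python sketch. It assumes a fixed, hand-chosen atom-to-bead assignment rather than the learned auto-encoder mapping the abstract describes, and all function names (`cg_map`, `map_forces`, `force_matching_loss`) are hypothetical, not taken from Autograin.

```python
def cg_map(positions, masses, assignment, n_beads):
    """Place each bead at the mass-weighted centre of its assigned atoms.

    Assumes every bead index in range(n_beads) has at least one atom.
    """
    dim = len(positions[0])
    sums = [[0.0] * dim for _ in range(n_beads)]
    bead_mass = [0.0] * n_beads
    for pos, m, b in zip(positions, masses, assignment):
        bead_mass[b] += m
        for k in range(dim):
            sums[b][k] += m * pos[k]
    return [[s / bead_mass[b] for s in sums[b]] for b in range(n_beads)]


def map_forces(forces, assignment, n_beads):
    """Sum atomistic forces over the atoms assigned to each bead."""
    dim = len(forces[0])
    out = [[0.0] * dim for _ in range(n_beads)]
    for f, b in zip(forces, assignment):
        for k in range(dim):
            out[b][k] += f[k]
    return out


def force_matching_loss(model_forces, target_forces):
    """Mean squared error over all bead force components."""
    sq = [(a - t) ** 2
          for fa, ft in zip(model_forces, target_forces)
          for a, t in zip(fa, ft)]
    return sum(sq) / len(sq)


# Toy example: three atoms mapped onto two beads.
positions = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
masses = [1.0, 1.0, 2.0]
assignment = [0, 0, 1]  # atom index -> bead index
atom_forces = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [-3.0, 0.0, 0.0]]

beads = cg_map(positions, masses, assignment, 2)
targets = map_forces(atom_forces, assignment, 2)
```

In a force-matching fit, `force_matching_loss` would be minimized over the parameters of a coarse-grained potential whose forces are compared against `targets`; that optimization loop is omitted here.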
Simulations with machine learning potentials identify the ion conduction mechanism mediating non-Arrhenius behavior in LGPS
Li10Ge(PS6)2 (LGPS) is a highly concentrated solid electrolyte, in
which Coulombic repulsion between neighboring cations is hypothesized as the
underlying reason for concerted ion hopping, a mechanism common among
superionic conductors such as Li7La3Zr2O12 (LLZO) and
Li1.3Al0.3Ti1.7(PO4)3 (LATP). While first principles
simulations using molecular dynamics (MD) provide insight into the Li
transport mechanism, historically, there has been a gap in the temperature
ranges studied in simulations and experiments. Here, we used a neural network
(NN) potential trained on density functional theory (DFT) simulations, to run
up to 40-nanosecond long MD simulations at DFT-like accuracy to characterize
the ion conduction mechanisms across a range of temperatures that includes
previous simulations and experimental studies. We have confirmed a Li
sublattice phase transition in LGPS around 400 K, below which the
ab-plane diffusivity is drastically reduced. Concomitant
with the sublattice phase transition near 400 K, there is less cation-cation
(cross) correlation, as characterized by Haven ratios closer to 1, and the
vibrations in the system are more harmonic at lower temperature. Intuitively,
at high temperature, the collection of vibrational modes may be sufficient to
drive concerted ion hops. However, near room temperature, the vibrational modes
available may be insufficient to overcome electrostatic repulsion, thus
resulting in less correlated ion motion and comparatively slower ion
conduction. Such phenomena of a sublattice phase transition, below which
concerted hopping plays a less significant role, may be extended to other
highly concentrated solid electrolytes such as LLZO and LATP.
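The Haven ratio mentioned in this abstract compares single-ion (tracer) diffusion with collective charge diffusion. A minimal Python sketch of one common estimator follows. It assumes ion displacements have already been accumulated over windows of equal lag time, so that the usual prefactors cancel in the ratio; the function name `haven_ratio` and the input layout are hypothetical.

```python
def haven_ratio(windows):
    """Estimate the Haven ratio from per-ion displacement vectors.

    windows: list of time windows, each a list of displacement vectors
    (one per mobile ion) accumulated over the same lag time.
    Returns <sum_i |dr_i|^2> / <|sum_i dr_i|^2>.
    """
    self_term = 0.0
    collective_term = 0.0
    for disp in windows:
        dim = len(disp[0])
        # Tracer part: squared displacement of each ion, summed.
        self_term += sum(sum(c * c for c in d) for d in disp)
        # Collective part: squared displacement of the summed ion motion.
        total = [sum(d[k] for d in disp) for k in range(dim)]
        collective_term += sum(c * c for c in total)
    return self_term / collective_term


# Four ions hopping together in the same direction (fully concerted motion).
concerted = [[[1.0, 0.0, 0.0]] * 4]
# Two ions moving in orthogonal directions (no cross term in this window).
independent = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]]
```

With this estimator, positively correlated motion pushes the ratio below 1, while a value near 1 indicates little cation-cation cross correlation, which is the low-temperature behavior the abstract reports.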
Learning Pair Potentials using Differentiable Simulations
Learning pair interactions from experimental or simulation data is of great
interest for molecular simulations. We propose a general stochastic method for
learning pair interactions from data using differentiable simulations
(DiffSim). DiffSim defines a loss function based on structural observables,
such as the radial distribution function, through molecular dynamics (MD)
simulations. The interaction potentials are then learned directly by stochastic
gradient descent, using backpropagation to calculate the gradient of the
structural loss metric with respect to the interaction potential through the MD
simulation. This gradient-based method is flexible and can be configured to
simulate and optimize multiple systems simultaneously. For example, it is
possible to simultaneously learn potentials for different temperatures or for
different compositions. We demonstrate the approach by recovering simple pair
potentials, such as Lennard-Jones systems, from radial distribution functions.
We find that DiffSim can be used to probe a wider functional space of pair
potentials compared to traditional methods like Iterative Boltzmann Inversion.
We show that our methods can be used to simultaneously fit potentials for
simulations at different compositions and temperatures to improve the
transferability of the learned potentials. Comment: 12 pages, 10 figures
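For context on the baseline this abstract compares against, the sketch below shows a Lennard-Jones pair potential and a single Iterative Boltzmann Inversion (IBI) update on a tabulated potential. It illustrates the traditional method, not DiffSim itself, which backpropagates through the simulation. The function names, the damping factor `alpha`, and the reduced units are assumptions made for illustration.

```python
import math


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def ibi_update(potential, g_current, g_target, kT=1.0, alpha=1.0):
    """One IBI step on a tabulated potential.

    V_new(r) = V(r) + alpha * kT * ln(g_current(r) / g_target(r)).
    Bins where either RDF is zero are left unchanged.
    """
    updated = []
    for v, g, gt in zip(potential, g_current, g_target):
        if g > 0.0 and gt > 0.0:
            v = v + alpha * kT * math.log(g / gt)
        updated.append(v)
    return updated


# Toy tabulated data: the second bin is over-populated relative to the target,
# so the update makes the potential there more repulsive.
new_potential = ibi_update([0.0, 1.0], [1.0, 2.0], [1.0, 1.0])
```

In practice the update is repeated, with a new simulation run after each step to recompute `g_current`, until the simulated radial distribution function matches the target.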
Chemistry-informed Macromolecule Graph Representation for Similarity Computation and Supervised Learning
Macromolecules are large, complex molecules composed of covalently bonded
monomer units, existing in different stereochemical configurations and
topologies. As a result of such chemical diversity, representing, comparing,
and learning over macromolecules emerge as critical challenges. To address
this, we developed a macromolecule graph representation, with monomers and
bonds as nodes and edges, respectively. We captured the inherent chemistry of
the macromolecule by using molecular fingerprints for node and edge attributes.
For the first time, we demonstrated computation of chemical similarity between
two macromolecules of varying chemistry and topology, using exact graph edit
distances and graph kernels. We also trained graph neural networks for a
variety of glycan classification tasks, achieving state-of-the-art results. Our
work has two-fold implications: it provides a general framework for
representation, comparison, and learning of macromolecules, and enables
quantitative chemistry-informed decision-making and iterative design in the
macromolecular chemical space. Comment: Main text: 4 pages, 2 figures, 1 table; Appendix: 18 pages, 25
figures, 3 tables
Differentiable sampling of molecular geometries with uncertainty-based adversarial attacks
Neural network (NN) interatomic potentials provide fast prediction of
potential energy surfaces, closely matching the accuracy of the electronic
structure methods used to produce the training data. However, NN predictions
are only reliable within well-learned training domains, and show volatile
behavior when extrapolating. Uncertainty quantification approaches can flag
atomic configurations for which prediction confidence is low, but arriving at
such uncertain regions requires expensive sampling of the NN phase space, often
using atomistic simulations. Here, we exploit automatic differentiation to
drive atomistic systems towards high-likelihood, high-uncertainty
configurations without the need for molecular dynamics simulations. By
performing adversarial attacks on an uncertainty metric, informative geometries
that expand the training domain of NNs are sampled. When combined with an active
learning loop, this approach bootstraps and improves NN potentials while
decreasing the number of calls to the ground truth method. This efficiency is
demonstrated on sampling of kinetic barriers and collective variables in
molecules, and can be extended to any NN potential architecture and materials
system. Comment: 12 pages, 4 figures, supporting information
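The idea of driving a geometry toward high-likelihood, high-uncertainty regions can be illustrated with a one-dimensional toy in Python. Everything here is an assumption made for illustration: the "ensemble" is three harmonic models with different curvatures, uncertainty is the variance of their predictions, likelihood is a Boltzmann factor of the mean predicted energy, and the gradient comes from central finite differences rather than the automatic differentiation the abstract describes.

```python
import math

# Hypothetical ensemble: harmonic energy models with different curvatures.
CURVATURES = (0.8, 1.0, 1.2)


def ensemble_predict(x):
    """Energy predicted by each ensemble member at coordinate x."""
    return [0.5 * k * x * x for k in CURVATURES]


def log_objective(x, kT=1.0):
    """log(variance) - mean_energy / kT: high uncertainty, still thermally likely.

    Undefined at x = 0, where all members agree and the variance vanishes.
    """
    preds = ensemble_predict(x)
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return math.log(var) - mean / kT


def attack(x0, steps=2000, lr=0.01, h=1e-5):
    """Gradient ascent on log_objective using central finite differences."""
    x = x0
    for _ in range(steps):
        grad = (log_objective(x + h) - log_objective(x - h)) / (2.0 * h)
        x += lr * grad
    return x


x_adv = attack(0.1)
```

For this toy the objective reduces to 4 ln x - x^2/2 plus a constant, so the ascent settles near x = 2, where the ensemble disagreement is large but the Boltzmann weight has not yet suppressed the configuration. In an active learning loop, such a point would be sent to the ground truth method and added to the training set.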
Photocell optimization using dark state protection
This work was supported by the Leverhulme Trust (RPG-080). EMG is supported by the Royal Society of Edinburgh/Scottish Government. RGB thanks Samsung Advanced Institute of Technology for funding. AF thanks the Anglo-Israeli association and the Anglo-Jewish association for funding. Conventional photocells suffer a fundamental efficiency threshold imposed by the principle of detailed balance, reflecting the fact that good absorbers must necessarily also be fast emitters. This limitation can be overcome by "parking" the energy of an absorbed photon in a dark state which neither absorbs nor emits light. Here we argue that suitable dark states occur naturally as a consequence of the dipole-dipole interaction between two proximal optical dipoles for a wide range of realistic molecular dimers. We develop an intuitive model of a photocell comprising two light-absorbing molecules coupled to an idealized reaction centre, showing asymmetric dimers are capable of providing a significant enhancement of light-to-current conversion under ambient conditions. We conclude by describing a road map for identifying suitable molecular dimers for demonstrating this effect by screening a very large set of possible candidate molecules. Postprint. Peer reviewed
Automated patent extraction powers generative modeling in focused chemical spaces
Deep generative models have emerged as an exciting avenue for inverse
molecular design, with progress coming from the interplay between training
algorithms and molecular representations. One of the key challenges in their
applicability to materials science and chemistry has been the lack of access to
sizeable training datasets with property labels. Published patents contain the
first disclosure of new materials prior to their publication in journals, and
are a vast source of scientific knowledge that has remained relatively untapped
in the field of data-driven molecular design. Because patents are filed seeking
to protect specific uses, molecules in patents can be considered to be weakly
labeled into application classes. Furthermore, patents published by the US
Patent and Trademark Office (USPTO) are downloadable and have machine-readable
text and molecular structures. In this work, we train domain-specific
generative models using patent data sources by developing an automated pipeline
to go from USPTO patent digital files to the generation of novel candidates
with minimal human intervention. We test the approach on two in-class extracted
datasets, one in organic electronics and another in tyrosine kinase inhibitors.
We then evaluate the ability of generative models trained on these in-class
datasets on two categories of tasks (distribution learning and property
optimization), identify strengths and limitations, and suggest possible
explanations and remedies that could be used to overcome these in practice.
From free-energy profiles to activation free energies
Given a chemical reaction going from reactant (R) to the product (P) on a potential energy surface (PES) and a collective variable (CV) discriminating between R and P, we define the free-energy profile (FEP) as the logarithm of the marginal Boltzmann distribution of the CV. This FEP is not a true free energy. Nevertheless, it is common to treat the FEP as the “free-energy” analog of the minimum potential energy path and to take the activation free energy, $\Delta F^{\ddagger}_{\mathrm{RP}}$, as the difference between the maximum at the transition state and the minimum at R. We show that this approximation can result in large errors. The FEP depends on the CV and is, therefore, not unique. For the same reaction, different discriminating CVs can yield different $\Delta F^{\ddagger}_{\mathrm{RP}}$. We derive an exact expression for the activation free energy that avoids this ambiguity. We find $\Delta F^{\ddagger}_{\mathrm{RP}}$ to be a combination of the probability of the system being in the reactant state, the probability density on the dividing surface, and the thermal de Broglie wavelength associated with the transition. We apply our formalism to simple analytic models and realistic chemical systems and show that the FEP-based approximation applies only at low temperatures for CVs with a small effective mass. Most chemical reactions occur on complex, high-dimensional PES that cannot be treated analytically and pose the added challenge of choosing a good CV. We study the influence of that choice and find that, while the reaction free energy is largely unaffected, $\Delta F^{\ddagger}_{\mathrm{RP}}$ is quite sensitive.
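For readers who want the FEP definition and the approximation criticized in this abstract in symbols, a sketch in the conventional notation follows. The symbols are assumptions, not taken from the abstract: $\xi(\mathbf{x})$ is the CV, $s$ its value, $U$ the potential energy, $\beta = 1/(k_B T)$, and $s_{\mathrm{R}}$ and $s^{\ddagger}$ the locations of the FEP minimum at R and the maximum at the transition state. The sign and the $k_B T$ prefactor follow the usual convention. The exact expression derived in the paper is not reproduced here.

```latex
% Marginal Boltzmann distribution of the CV and the resulting free-energy profile
p(s) = \frac{\int \delta\big(\xi(\mathbf{x}) - s\big)\, e^{-\beta U(\mathbf{x})}\, d\mathbf{x}}
            {\int e^{-\beta U(\mathbf{x})}\, d\mathbf{x}},
\qquad
F(s) = -k_B T \ln p(s)

% Common FEP-based approximation to the activation free energy
\Delta F^{\ddagger}_{\mathrm{RP}} \approx F(s^{\ddagger}) - F(s_{\mathrm{R}})
```

Because $p(s)$ is a density in $s$, $F(s)$ changes under a reparametrization of the CV, which is one way to see why the barrier read off the profile is not unique.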